Abstract. The Anji Salamander, Hynobius amjiensis Gu, 1992, is a critically endangered amphibian known only from a few mountain tops in Zhejiang and Anhui Provinces, China. Alongside tremendous efforts to breed this species in captivity, there have been attempts to understand the causes of its endangered status from a genetic perspective, but these have been limited to a few markers from earlier studies. Here we used next-generation sequencing on the DNBSEQ platform to investigate the characteristics of the salamander's genome. In the k-mer analysis, the 19-mer frequency distribution yielded the optimal estimate, suggesting a genome size of ~17.54 Gb, 70.77% of which consists of repetitive sequences. Filtered sequences were assembled into 8,852,165 contigs with an N50 of 1052 bp and a GC content of 45.3%. We identified 1,441,045 microsatellite loci across the assembled partial genome. Mono-nucleotide microsatellites accounted for 61.4% of all loci, and 2- to 6-base repeat motifs were present at frequencies of 19.5, 9.6, 7.9, 1.5, and 0.06%, respectively. PCR primers were developed for 98 microsatellite loci with 2- to 6-base motifs. Our work provides an overview of the genome characteristics of H. amjiensis that can serve subsequent studies on the evolutionary history of this endangered salamander.
Introduction


The Anji Salamander, Hynobius amjiensis (Fig. 1), belongs to the family Hynobiidae (order Caudata) and is found in the Sphagnum moss and herbage of the montane swamps on Longwang Mountain (GU 1992). Endemic to China, it has since been listed by the International Union for Conservation of Nature (IUCN) among the top ten critically endangered amphibians, and as critically endangered in the Red List of China's Biodiversity (Amphibians) (JIANG & XIE 2015). It has also been given Class I protection in the most recent list of Wild Animals under Special State Protection in China (2021). This salamander is currently known from only four neighbouring sites in China: Longwang Mountain in Anji, Qingliangfeng and Baizhangling Mountain in Linan (all in Zhejiang Province), and Qingliangfeng Mountain in Anhui Province. There, increasing human activities and adverse changes in climate are making survival ever more difficult for it, as both lead to the shrinking of Sphagnum moss-covered swamps and a reduction in the number of breeding sites. As a result, the reproductive population of H. amjiensis has been declining steadily. Since the estimated effective population comprises merely 600 adults (CHEN et al. 2016), it is necessary to study the mechanisms causing its endangerment and to manage and, if possible, expand the suitable habitat for this population.
To the latter end, Chinese authorities have set up the Anji Salamander National Nature Reserve, in which the Anji Salamander may find a less disturbed habitat. Throughout this reserve, we have also devised and implemented multiple strategies to help this population recover, including the protection and restoration of habitat alongside captive breeding. However, several major questions remain unresolved: for instance, how large the genome of H. amjiensis is, what genetic structure exists in the population, how many subpopulations actually exist, and to what extent gene exchange is still taking place. Answers to these key questions should help in the protection of H. amjiensis.

Due to its rarity, only a few research groups have thus far been able to conduct basic studies on Hynobius amjiensis, such as on its breeding characteristics (GU et al. 1999) and on the influence of size class, population density, and food availability on its larval development (FU et al. 2003a, b). YE (2012) studied which ecological factors of its microhabitat had the largest impact on its winter habitat selection. YANG et al. (2016) were the first to study the population structure and genetic diversity of H. amjiensis and found no evidence of geographic partitioning between populations; Bayesian skyline plots also revealed no dramatic change in population size (YANG et al. 2016). However, this work was limited to mitochondrial DNA data, which showed minimal intraspecific divergence and thus were not suitable for a more in-depth study of population demographics. Only recently, KAN et al. (2021) used double-digest restriction-site associated DNA (ddRAD) sequencing to develop 33 single nucleotide polymorphism (SNP) markers, which are useful for population-level genetic assessments and hence for the conservation of species (KAN et al. 2021, KAN 2022), including the one dealt with here. However, these studies were likewise limited to partial genomic data, which showed minimal intraspecific divergence and thus were not suitable for a more in-depth study of population demographics either.
With the advent of next-generation sequencing, genome scanning of non-model organisms has become more feasible and less expensive. This approach can provide an overview of the whole genome of the study organism and build a foundation for subsequent genomic studies. Because the genome of Hynobius amjiensis had not been assessed to date, we performed a genome scan using k-mer analysis to generate a genomic reference for this critically endangered salamander. We compared its estimated genome size to those of other species of the orders Caudata and Anura. We also identified more than one million microsatellite loci, which are powerful markers for both historical and contemporary estimations of population demographics (CHor et al. 2021). Microsatellite analysis can reveal the evolutionary and ecological causes that have led to the current endangered status of H. amjiensis, and these insights may then help with its protection.


[Figure 1: genome sizes by amphibian family; x-axis: Family (families shown include Sirenidae, Cryptobranchidae, Ambystomatidae, Dicamptodontidae, Salamandridae, Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae)]
Figure 1. Genome size estimation (gigabases) using a k-mer approach in Hynobius amjiensis. Genome size estimates for the other taxa were retrieved from www.genomesize.com (last accessed on 8 April 2022). The inset at the top depicts a specimen of H. amjiensis (photo: CANGSONG CHEN).
Materials and methods
Sample collection and DNA extraction 


To avoid injuring breeding adults of H. amjiensis, several larvae were collected to obtain tissue samples. They were caught in a swamp at Qianmutian on Longwang Mountain during the breeding season and stored frozen. We subsequently cut 200 mg of tissue from one larva (AA0000810M1, vouchered at the Zhejiang Museum of Natural History) into pieces and incubated it with 1 ml lysis buffer containing 2 mg proteinase K at 56°C for 1-2 hours. DNA was extracted following the phenol/chloroform/isopentanol (25:24:1) protocol by BGI Genomics, China, and eluted in 30 µl TE buffer. DNA quality was assessed by electrophoresis on a 1.5% agarose gel (PowerPac, Bio-Rad, USA). DNA quantity was measured with an ultraviolet-visible spectrophotometer (NanoDrop 2000, Thermo Scientific, USA) to ensure that samples contained at least 2 µg of genomic DNA.


Library preparation and sequencing 


A DNA library was constructed according to the standard protocol of the PCR-free DNBSEQ™ library (BGI Genomics, China). Briefly, 1 µg of genomic DNA was randomly sheared with the Covaris System (Woburn, Massachusetts, USA) and size-selected with the Agencourt AMPure XP-Medium kit (Beckman Coulter Inc., California, USA) to an average size of 300-400 bp. Selected fragments were end-repaired and 3'-adenylated, and adaptors were then ligated to the ends of these 3'-adenylated fragments. The DNA fragments were purified with the Agencourt AMPure XP-Medium kit, heat-denatured into single strands, and circularized with a splint oligo sequence. The single-stranded circular DNA (ssCir DNA) formed the final library, which underwent rolling-circle amplification (RCA) to form DNA nanoballs (DNBs). The final qualified libraries were sequenced on a BGISEQ-500, and paired-end 100-bp reads were obtained by combinatorial Probe-Anchor Synthesis (cPAS) at BGI Genomics, China.


K-mer analysis, genome assembly, and 
microsatellite analysis 


Prior to the k-mer analysis, raw reads were filtered to remove sequencing adapters, contamination, and low-quality reads (e.g., reads containing ambiguous bases "N" and reads in which more than 10% of bases had Q < 20). Filtering was conducted with SOAPnuke 1.5.6 (https://github.com/BGIflexlab/SOAPnuke), applying the following parameters: -n 0.01 -l 20 -q 0.1 -i -Q 2 -G -M 2 -A 0.5 -d. The filtered reads were subjected to k-mer analysis using Jellyfish (MARÇAIS & KINGSFORD 2011) and GenomeScope (VURTURE et al. 2017). We then assembled the genome with MaSuRCA (ZIMIN et al. 2013) and used MISA v2.1 to identify microsatellite loci, which were called only when the motif was repeated more than ten times for a single base, more than six times for two bases, and more than five times for 3-6 bases.
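The calling rule above can be illustrated with a small scanner for perfect tandem repeats. This is a minimal sketch, not MISA's own code: it reads the thresholds as minimum copy numbers (10 for mono-, 6 for di-, and 5 for tri- to hexa-nucleotide motifs, matching the commonly used MISA definition `1-10 2-6 3-5 4-5 5-5 6-5`); if "more than" is meant strictly, each threshold should be raised by one. The names `find_ssrs`, `SSR`, and `MIN_REPEATS` are hypothetical, and compound or imperfect repeats are not handled.

```python
from dataclasses import dataclass

# Hypothetical thresholds: minimum number of tandem copies per motif length (bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


@dataclass(frozen=True)
class SSR:
    start: int    # 0-based position of the first repeat copy
    motif: str    # repeat unit as it appears on the scanned strand
    repeats: int  # number of tandem copies


def _is_reducible(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[SSR]:
    """Greedy left-to-right scan for perfect microsatellites (simplified, MISA-like)."""
    seq = seq.upper()
    hits: list[SSR] = []
    i = 0
    while i < len(seq):
        hit = None
        for k in sorted(min_repeats):  # try the shortest motif length first
            motif = seq[i:i + k]
            if len(motif) < k or "N" in motif or _is_reducible(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_repeats[k]:
                hit = SSR(i, motif, copies)
                break
        if hit:
            hits.append(hit)
            i += hit.repeats * len(hit.motif)  # jump past the whole repeat
        else:
            i += 1
    return hits
```

For example, `find_ssrs("GG" + "AC" * 7 + "TTT" + "A" * 12 + "G")` reports an (AC)7 repeat starting at position 2 and an (A)12 run starting at position 19. Trying the shortest motif length first, and skipping units that are themselves repeats, keeps each locus assigned to its smallest repeat unit.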


Primer design and validation 


To develop PCR primers for identified microsatellite loci, we first filtered out mono-nucleotide repeat motifs and any motifs that appeared fewer than 10 times in the assembly. We then discarded microsatellites near the ends of contigs, requiring at least 150 bp of upstream and downstream flanking sequence, which ensured enough sequence for identifying primer locations. We furthermore filtered out microsatellites situated within 100 bp of each other, because such loci are essentially linked and would violate the assumption of no linkage disequilibrium in subsequent analyses. Primers were designed with the Primer3 tool (p3_in.pl, p3_out.pl, primer3-2.5.0) incorporated in MISA v2.1 (http://pgrc.ipk-gatersleben.de/misa/). Lastly, we used PANDAseq (MASELLA et al. 2012) to confirm that the designed primer sequences were present in the assembly. From the final primer pairs, 24 pairs were experimentally validated by PCR before further use. Each test was performed in triplicate. Each 20 µl PCR reaction comprised 1 µl genomic DNA, 10 µl 2× Super PCR Mix Pro, 0.5 µl each of forward and reverse primers, and 8 µl ddH₂O. The PCR program was as follows: 95°C for 4 min; 35 cycles of 95°C for 30 s, 57°C for 30 s, and 72°C for 30 s; and a final extension at 72°C for 7 min. The PCR products were separated by 4% agarose gel electrophoresis, and the BGI D2000 Plus DNA ladder (BGI, China) was used to estimate fragment sizes.
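The locus-level filters described in this subsection can be expressed as a short function. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: coordinates are assumed to be 0-based with exclusive ends, motif counts are assumed to be taken over exact motif strings, and the 100-bp rule is applied to the gap between adjacent loci on the same contig (including mono-nucleotide loci). The `Locus` type and `primer_candidates` function are hypothetical; primer design itself (Primer3) and the PANDAseq check are not reproduced.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    contig: str
    start: int  # 0-based start of the repeat
    end: int    # exclusive end of the repeat
    motif: str


def primer_candidates(loci, contig_lengths, min_flank=150, min_gap=100, min_motif_count=10):
    """Keep loci that pass the motif, flank-length, and spacing filters (hypothetical helper)."""
    motif_counts = Counter(locus.motif for locus in loci)
    by_contig = defaultdict(list)
    for locus in loci:
        by_contig[locus.contig].append(locus)

    kept = []
    for contig, items in by_contig.items():
        items.sort(key=lambda x: x.start)
        for idx, locus in enumerate(items):
            if len(locus.motif) == 1:  # drop mono-nucleotide repeats
                continue
            if motif_counts[locus.motif] < min_motif_count:  # drop rare motifs
                continue
            # require enough flanking sequence on both sides for primer placement
            if locus.start < min_flank or contig_lengths[contig] - locus.end < min_flank:
                continue
            # drop loci lying within min_gap bp of a neighbouring locus (likely linked)
            too_close_prev = idx > 0 and locus.start - items[idx - 1].end < min_gap
            too_close_next = idx + 1 < len(items) and items[idx + 1].start - locus.end < min_gap
            if too_close_prev or too_close_next:
                continue
            kept.append(locus)
    return kept
```

Checking spacing against both neighbours means that a close pair is removed entirely rather than keeping one member, which mirrors the wording "filtered out microsatellites that were situated within 100 bp of each other".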


Results and discussion
Sequencing data summary 


DNA nanoball sequencing (DNB-seq) generated 595.45 Gb of raw data for Hynobius amjiensis. After filtering, we obtained 2,372,006,372 clean reads for read 1 and 1,386,500,244 clean reads for read 2, totalling 563.78 Gb of filtered data. The GC content of the clean reads was estimated to be 46.84%, which is within the commonly cited range of 30-50% (CHEUNG et al. 2011, ZHOU et al. 2013, SHANGGUAN et al. 2013).
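GC content, and the contig N50 and L50 reported for the assembly later in this section, are simple summary statistics; the sketch below shows one common way to compute them. It assumes GC content is calculated over unambiguous bases only (ignoring N), which the text does not specify, and the function names are hypothetical.

```python
def gc_content(seqs):
    """Percentage of G+C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    gc = acgt = 0
    for s in seqs:
        s = s.upper()
        g_c = s.count("G") + s.count("C")
        gc += g_c
        acgt += g_c + s.count("A") + s.count("T")
    return 100.0 * gc / acgt


def n50_l50(lengths):
    """N50: length of the contig at which the cumulative length (longest first) reaches
    half of the total; L50: the number of contigs needed to reach that point."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive integers")
```

Note that N50 is a length (bp), whereas L50 is a count of contigs.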


Genome assessment and k-mer analysis 


We performed k-mer analysis by evaluating seven k values from 19 to 31. The results show that k = 19 produced the optimal estimate (Table 1, Fig. 2) (MARÇAIS & KINGSFORD 2011), because it had the lowest error rate at a heterozygosity rate close to 1%. The estimated genome size of Hynobius amjiensis was about 17.54 Gb (read depth = 32×), which is smaller than in other salamander species for which genome sequencing data are available. For example, Ambystoma mexicanum (Ambystomatidae) has a genome




Table 1. K-mer statistics.


KAIYANG CHEN et al. 


k    No. of k-mers      Used bases (bp)    Genome size (bp)   Heterozygosity   Repeat rate   Error rate   Depth (×)
19   486,539,139,441    562,097,417,035    17,543,380,785     1.01%            70.77%        0.3%         32.04
21   480,620,686,817    561,735,433,384    17,746,536,093     1.04%            63.87%        0.36%        31.65
23   474,361,759,556    561,656,713,342    17,804,958,084     1.02%            60.73%        0.38%        31.54
25   467,859,957,717    561,635,983,299    17,845,504,353     0.99%            58.35%        0.38%        31.47
27   461,183,308,916    561,627,746,531    17,873,834,662     0.96%            56.23%        0.38%        31.42
29   454,362,831,498    561,626,872,679    17,895,949,323     0.94%            54.3%         0.38%        31.38
31   447,426,522,039    561,629,855,054    17,911,969,466     0.91%            52.54%        0.38%        31.36


of about 32 Gb (KEINATH et al. 2015), and Pleurodeles waltl (Salamandridae) one of about 20 Gb (ELEWA et al. 2017). However, our estimate is in line with those for other species of the genus Hynobius, which likewise have relatively small genomes in terms of C-values, i.e., between 16.16 and 20.45 pg (Fig. 1) (OLMO 1973), equivalent to 15.8 to 20 Gb (assuming 1 pg = 978 Mb).
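The genome size estimates in Table 1 can be checked with simple arithmetic. In the usual k-mer approach, genome size is the number of bases (or k-mers) used divided by the corresponding depth at the main peak. Dividing the "Used bases" column by the "Depth" column reproduces the reported sizes to within about 0.02%, which suggests that the depth column is base-level depth (an inference, not stated in the text). The same sketch converts the Hynobius C-values to gigabases with the 1 pg = 978 Mb factor used above. The function names are hypothetical.

```python
# Values from Table 1: k -> (used bases, depth, reported genome size in bp)
TABLE1 = {
    19: (562_097_417_035, 32.04, 17_543_380_785),
    21: (561_735_433_384, 31.65, 17_746_536_093),
    23: (561_656_713_342, 31.54, 17_804_958_084),
    25: (561_635_983_299, 31.47, 17_845_504_353),
    27: (561_627_746_531, 31.42, 17_873_834_662),
    29: (561_626_872_679, 31.38, 17_895_949_323),
    31: (561_629_855_054, 31.36, 17_911_969_466),
}


def genome_size(used_bases: float, depth: float) -> float:
    """k-mer genome size estimate: bases used divided by depth at the main peak."""
    return used_bases / depth


def pg_to_gb(pg: float, mb_per_pg: float = 978.0) -> float:
    """Convert a C-value in picograms to gigabases (1 pg = 978 Mb by default)."""
    return pg * mb_per_pg / 1000.0


for k, (used, depth, reported) in TABLE1.items():
    estimate = genome_size(used, depth)
    print(f"k={k}: {estimate / 1e9:.2f} Gb (Table 1: {reported / 1e9:.2f} Gb)")

print(f"Hynobius C-value range: {pg_to_gb(16.16):.1f}-{pg_to_gb(20.45):.1f} Gb")
```

The small residual differences are consistent with the depth column being rounded to two decimals.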

Amongst the ten recognized families in the order Caudata, C-values estimated by Feulgen densitometry or flow cytometry (www.genomesize.com, last accessed on 8 April 2022) show an average genome size of 35.3 pg. However, there is considerable variation, ranging from 10.1 pg in Gyrinophilus porphyriticus (Plethodontidae) (Gorn et al. 1968) to as much as 120.6 pg in Necturus lewisi (Proteidae) (OLMO 1973). Although the Hynobiidae is an old phylogenetic group, its genome size does not seem to correlate with its age. Indeed, this family generally has a smaller genome than the other nine families. The other basal lineage in the order Caudata, Cryptobranchidae, has much larger genomes of around 50 pg (GREGORY 2022). On the other hand, frogs and toads in the order Anura mostly possess small genomes of less than 10 pg, and many species have a genome size as small as about 5 pg (GREGORY 2022). Polyploidy is known to increase genome size dramatically in plants, but it remains unclear how salamanders acquired such large genomes. A large genome could benefit the organism by providing extra genetic material for mutation, which may increase the likelihood of adaptation through selection.

[Figure 2: GenomeScope profile for k = 19 (len: 17,543,380,785 bp; uniq: 29.2%; het: 1.01%; kcov: 12.9; err: 0.298%; dup: 0.576%), showing the observed distribution, full model, unique sequence, errors, and k-mer peaks; x-axis: Coverage]

Figure 2. K-mer analysis (k = 19) of Hynobius amjiensis. The x-axis represents coverage, and the y-axis the frequency at each depth.

Another notable feature of the Hynobius amjiensis genome is its high proportion (70.77%) of repetitive sequences. In contrast, only about 40% of the genome of Ambystoma mexicanum is estimated to consist of repetitive sequences (KEINATH et al. 2015). After genome assembly, we obtained 8.85 million contigs, the longest of which approached 62.15 kb. The N50 was 1052 bp, and the L50 was 2,190,151 contigs. The total length of the assembled contigs was 7.7 Gb, representing 43.9% of the genome.

Table 2. Statistics for the six microsatellite repeat types (repeat type = motif length in bases).

Repeat type   Number    Proportion (%)   Average distance (bp)
1             885,011   61.41
2             281,559   19.54            128.09
3             138,721    9.63            256.27
4             113,496    7.88            469.16
5              21,431    1.49            358.73
6                 827    0.06            200.97

Microsatellite discovery and primer design

Among microsatellites with motif lengths of 1-6 bases, there were 1,441,045 loci identified in the assembled partial genome of Hynobius amjiensis, the total length of


[Figure 3: frequencies of motif classes for di-, tri-, tetra-, penta-, and hexa-nucleotide repeats; each class is labelled by a motif and its reverse complement (e.g., AC/GT, CA/TG, AAT/ATT, CTCA/TGAG, CTCAA/TTGAG, GGGTTA/TAACCC), with less frequent classes pooled as "Other"]
Figure 3. Frequencies of microsatellites in Hynobius amjiensis.
Table 3. Overview of bands: A-H represent plate rows, and 1-9 represent plate columns for each primer pair tested in this study. AJx denotes one primer pair, and the suffixes -1, -2, and -3 denote the three replicates.


1 2 3 4 5 6 7 8 9 

A AJ3-1 AJ6-3 AJ11-2 AJ23-1 AJ44-3 AJ47-2 AJ86-1 AJ88-3 AJ95-2 
B AJ3-2 AJ8-1 AJ11-3 AJ23-2 AJ45-1 AJ47-3 AJ86-2 AJ89-1 AJ95-3 
C AJ3-3 AJ8-2 AJ21-1 AJ23-3 AJ45-2 AJ48-1 AJ86-3 AJ89-2 AJ96-1 
D AJ5-1 AJ8-3 AJ21-2 AJ24-1 AJ45-3 AJ48-2 AJ87-1 AJ89-3 AJ96-2 
E AJ5-2 AJ10-1 AJ21-3 AJ24-2 AJ46-1 AJ48-3 AJ87-2 AJ94-1 AJ96-3 
F AJ5-3 AJ10-2 AJ22-1 AJ24-3 AJ46-2 AJ85-1 AJ87-3 AJ94-2 AJ97-1 
G AJ6-1 AJ10-3 AJ22-2 AJ44-1 AJ46-3 AJ85-2 AJ88-1 AJ94-3 AJ97-2 
H AJ6-2 AJ11-1 AJ22-3 AJ44-2 AJ47-1 AJ85-3 AJ88-2 AJ95-1 AJ97-3 


Figure 4. PCR test results for the 24 primer pairs used in PCR amplifications for Hynobius amjiensis, visualized by 4% agarose gel electrophoresis. There are 72 lanes (24 primer pairs in triplicate), three of which lack a band: primer pair AJ10 (E2, F2, G2) failed to amplify.
which represented 0.12% of the whole genome. In comparison, microsatellites make up a much greater percentage of the genome in some other animals, such as 3% in humans (SuBBaya et al. 2003), 2.85% in Mus musculus (TONG et al. 2006), 1.41% in Rattus norvegicus (TU et al. 2015), and 0.77% in the Japanese puffer fish (Takifugu rubripes) (Cut et al. 2006). It is likely that the low prevalence of microsatellites in H. amjiensis is associated with low genetic diversity in this species. Amongst those microsatellites, mono-nucleotide motifs accounted for 61.4% (885,011) of identified loci, followed by di-nucleotide (19.5%, 281,559), tri-nucleotide (9.6%, 138,721), and tetra-nucleotide repeat motifs (7.9%, 113,496) (Table 2). Penta- and hexa-nucleotide motif repeats occurred at much lower frequencies. We identified 4, 12, 60, 212, 514, and 368 types of microsatellites for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motif repeats, respectively. The smaller number of hexa- than penta-nucleotide microsatellite types is likely due to the lower frequency of, and an increased mutation rate associated with, longer motifs (KATTI et al. 2001). The microsatellite motifs with the highest frequencies are illustrated in Figure 3. Interestingly, the majority of microsatellites are heavily biased towards the nucleotides A and T. After excluding mono-nucleotide repeats, the most frequent repeat motif was AC/GT. The average number of repeats per microsatellite locus, relative frequency (RF), and relative abundance (RA) were summarized for loci that had at least 700 copies in the assembly (Supplementary Table S1). Lastly, after applying the filtering criteria, we obtained 98 microsatellite markers with repeat motifs of 2-6 bases, and primer pairs were designed for those markers (Supplementary Table S2). These included 8, 35, 32, 15, and 8 primer pairs for di-, tri-, tetra-, penta-, and hexa-nucleotide motif repeats, respectively. Our newly designed primers can be used to assess the population structure and evolutionary history of H. amjiensis, and they may also work for other species of Hynobius. From these primer sets, we randomly selected 24 primer pairs for PCR amplification with H. amjiensis, 23 of which produced one clear band (Fig. 4, Table 3); this will help us to amplify these microsatellite loci in future studies, for instance, for parentage tests. Genome assembly data that support the findings of this study have been deposited in the CNGB Sequence Archive (CNSA) (GUO et al. 2020) of the China National GeneBank DataBase (CNGBdb) (CHEN et al. 2020) under accession number CNP0002813.
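Figure 3 labels each motif class by a motif and its reverse complement (for example AC/GT and AAT/ATT), so that a repeat read from either strand falls into one class, while rotations such as AC and CA stay separate (the legend lists both AC/GT and CA/TG). The sketch below reproduces that labelling under this reading and recomputes the Table 2 proportions from the reported locus counts (the mono-nucleotide count, 885,011, is taken from the text). The function and variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_class(motif: str) -> str:
    """Label a motif by itself and its reverse complement, e.g. 'GT' -> 'AC/GT'."""
    first, second = sorted((motif.upper(), reverse_complement(motif)))
    return f"{first}/{second}"


# Locus counts per motif length (Table 2; the mono-nucleotide count is from the text).
COUNTS = {1: 885_011, 2: 281_559, 3: 138_721, 4: 113_496, 5: 21_431, 6: 827}
total = sum(COUNTS.values())
shares = {k: round(100 * n / total, 2) for k, n in COUNTS.items()}

for k, share in shares.items():
    print(f"{k}-bp motifs: {COUNTS[k]:,} loci ({share}%)")
```

Sorting the motif and its reverse complement gives a stable label regardless of which strand the repeat was read from; palindromic motifs such as AT map to AT/AT.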


Conclusions 


For the first time, we assessed the genomic characteristics of Hynobius amjiensis, a critically endangered salamander endemic to a small area in eastern China. Its genome size was estimated to be 17.54 Gb, with 70.77% repetitive sequences. We further assembled 7.7 Gb of its genome sequence, which can serve as a foundation for future genomic studies to understand its small population size and how to conserve this evolutionary lineage that is facing extinction. A total of 98 pairs of microsatellite primers were designed. Our work thus provides important resources and methods to assess the population structure and demographic history of H. amjiensis and possibly other species of this genus.
The following data are available online:


Supplementary Table S1. Characteristic statistics of microsatellites that had at least 700 copies.


Supplementary Table S2. PCR primers for a selection of microsatellites.